selection bia
Overcoming Selection Bias in Statistical Studies With Amortized Bayesian Inference
Arruda, Jonas, Chervet, Sophie, Staudt, Paula, Wieser, Andreas, Hoelscher, Michael, Sermet-Gaudelus, Isabelle, Binder, Nadine, Opatowski, Lulla, Hasenauer, Jan
Selection bias arises when the probability that an observation enters a dataset depends on variables related to the quantities of interest, leading to systematic distortions in estimation and uncertainty quantification. For example, in epidemiological or survey settings, individuals with certain outcomes may be more likely to be included, resulting in biased prevalence estimates with potentially substantial downstream impact. Classical corrections, such as inverse-probability weighting or explicit likelihood-based models of the selection process, rely on tractable likelihoods, which limits their applicability in complex stochastic models with latent dynamics or high-dimensional structure. Simulation-based inference enables Bayesian analysis without tractable likelihoods but typically assumes missingness at random and thus fails when selection depends on unobserved outcomes or covariates. Here, we develop a bias-aware simulation-based inference framework that explicitly incorporates selection into neural posterior estimation. By embedding the selection mechanism directly into the generative simulator, the approach enables amortized Bayesian inference without requiring tractable likelihoods. This recasting of selection bias as part of the simulation process allows us to both obtain debiased estimates and explicitly test for the presence of bias. The framework integrates diagnostics to detect discrepancies between simulated and observed data and to assess posterior calibration. The method recovers well-calibrated posterior distributions across three statistical applications with diverse selection mechanisms, including settings in which likelihood-based approaches yield biased estimates. These results recast the correction of selection bias as a simulation problem and establish simulation-based inference as a practical and testable strategy for parameter estimation under selection bias.
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
- Europe > Germany > Baden-Württemberg > Freiburg (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- (6 more...)
- Information Technology > Enterprise Applications > Customer Relationship Management (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.85)
Spurious Predictability in Financial Machine Learning
Adaptive specification search generates statistically significant backtests even under martingale-difference nulls. We introduce a falsification audit testing complete predictive workflows against synthetic reference classes, including zero-predictability environments and microstructure placebos. Workflows generating significant walk-forward evidence in these environments are falsified. For passing workflows, we quantify selection-induced performance inflation using an absolute magnitude gap linking optimized in-sample evidence to disjoint walk-forward realizations, adjusted for effective multiplicity. Simulations validate extreme-value scaling under correlated searches and demonstrate detection power under genuine structure. Empirical case studies confirm that many apparent findings represent methodological artifacts rather than genuine predictability.
- Europe > Greece (0.40)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Research Report > Experimental Study (0.46)
- Research Report > New Finding (0.46)
- Banking & Finance (1.00)
- Health & Medicine (0.66)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Data Science > Data Mining (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)
Complete Causal Identification from Ancestral Graphs under Selection Bias
Many causal discovery algorithms, including the celebrated FCI algorithm, output a Partial Ancestral Graph (PAG). PAGs serve as an abstract graphical representation of the underlying causal structure, modeled by directed acyclic graphs with latent and selection variables. This paper develops a characterization of the set of extended-type conditional independence relations that are invariant across all causal models represented by a PAG. This theory allows us to formulate a general measure-theoretic version of Pearl's causal calculus and a sound and complete identification algorithm for PAGs under selection bias. Our results also apply when PAGs are learned by certain algorithms that integrate observational data with experimental data and incorporate background knowledge.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.13)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (4 more...)
- Banking & Finance > Credit (0.48)
- Health & Medicine (0.47)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- Europe > Switzerland (0.04)
- (2 more...)
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.68)
- Information Technology (0.67)
- Health & Medicine (0.46)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (14 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (2 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)